Adding different types of parallelism to the elementwise layer#222
Conversation
| EWLayerImpl() = delete; | ||
| EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F, | ||
| float beta = 0.0F); | ||
| float beta = 0.0F, int type_parall = 0); |
There was a problem hiding this comment.
Use a strongly-typed backend enum instead of int for readability and safety.
enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };
There was a problem hiding this comment.
Propagate ParBackend through API instead of raw int
| int available_threads = -1; | ||
| if (type_parall_ == 0) available_threads = 1; | ||
| if (type_parall_ == 1) | ||
| available_threads = std::thread::hardware_concurrency(); | ||
| if (type_parall_ == 2) | ||
| available_threads = oneapi::tbb::info::default_concurrency(); | ||
| if (type_parall_ == 3) available_threads = omp_get_max_threads(); |
There was a problem hiding this comment.
Please wrap common function for getting thread number
| @@ -1,5 +1,11 @@ | |||
| #pragma once | |||
| #include <omp.h> | |||
There was a problem hiding this comment.
Guard the OpenMP/TBB includes and add ; otherwise non-OpenMP builds fail.
#ifdef HAS_OPENMP
#include <omp.h>
#endif
#include <thread>
#ifdef HAS_TBB
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/info.h>
#endif
| EWLayerImpl() = delete; | ||
| EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F, | ||
| float beta = 0.0F); | ||
| float beta = 0.0F, int type_parall = 0); |
There was a problem hiding this comment.
Propagate ParBackend through API instead of raw int
| }; | ||
|
|
||
| template <typename Func> | ||
| inline void parallel_for(int count, Func func, int mode = 0) { |
There was a problem hiding this comment.
| inline void parallel_for(int count, Func func, int mode = 0) { | |
| inline void parallel_for(int count, Func func, int mode = 0) { | |
| if (count <= 0) return; |
| @@ -1,5 +1,11 @@ | |||
| #pragma once | |||
| #include <omp.h> | |||
There was a problem hiding this comment.
Move all backend headers (OpenMP/TBB/Threads) and implementation details into a small parallel module. Expose a single, inline header API so call sites incur no extra call/indirection.
- include/parallel/parallel.hpp (inline API)
- include/parallel/backends.hpp (backend helpers; guarded includes)
- No <omp.h>/TBB headers leaking into layer headers.
Example:
// include/parallel/parallel.hpp
#pragma once
#include <cstddef>
enum class ParBackend { Auto, Seq, Threads, TBB, OMP };
struct ParOptions {
ParBackend backend = ParBackend::Auto;
int max_threads = 0; // 0 = runtime default
std::size_t min_parallel_n = 4096; // small tasks stay sequential
std::size_t grain = 1024; // backend-specific chunk hint
};
// Header-only: one branch + inlined backend
template <class F>
inline void parallel_for(std::size_t n, F&& f, const ParOptions& opt) {
if (n == 0) return;
const ParBackend b = select_backend(opt, n); // inline, cheap
switch (b) {
case ParBackend::Seq: return impl_seq(n, f);
case ParBackend::Threads: return impl_threads(n, f, opt);
case ParBackend::TBB: return impl_tbb(n, f, opt);
case ParBackend::OMP: return impl_omp(n, f, opt);
case ParBackend::Auto: return impl_seq(n, f); // unreachable
}
}
There was a problem hiding this comment.
but not add auto
| @@ -1,5 +1,11 @@ | |||
| #pragma once | |||
| #include <omp.h> | |||
|
|
|||
There was a problem hiding this comment.
Avoid re-evaluating “Auto” logic every call. Resolve once (feature flags + environment + problem size) and cache in the layer/context.
// Called once per layer or first use
inline ParBackend resolve_auto_once(const ParOptions& opt, std::size_t n) noexcept {
#if defined(HAS_OMP)
if (n >= opt.min_parallel_n) return ParBackend::OMP;
#elif defined(HAS_TBB)
if (n >= opt.min_parallel_n) return ParBackend::TBB;
#elif defined(HAS_THREADS)
if (n >= opt.min_parallel_n) return ParBackend::Threads;
#endif
return ParBackend::Seq;
}
inline ParBackend select_backend(const ParOptions& opt, std::size_t n) noexcept {
if (opt.backend != ParBackend::Auto) return opt.backend;
static ParBackend cached = resolve_auto_once(opt, n); // or store in the layer
return cached;
}
aobolensk
left a comment
There was a problem hiding this comment.
I actually think we can leave remaining solution basically as is. The problem with OpenMP slowdown is actually reproducible, but I suggest to focus on parallel_for itself. Anyway, this effect is not that visible on matrix multiplication workloads. For further investigation we will take a look at the compilation details (which code it has been lowered to). For now we can proceed as is
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #222 +/- ##
==========================================
+ Coverage 83.09% 83.18% +0.09%
==========================================
Files 44 46 +2
Lines 2271 2349 +78
Branches 1349 1397 +48
==========================================
+ Hits 1887 1954 +67
- Misses 187 190 +3
- Partials 197 205 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| if (NOT WIN32) | ||
| set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -Werror") | ||
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Werror") | ||
| # if (NOT WIN32) |
There was a problem hiding this comment.
Please, revert. Flags are still required
There was a problem hiding this comment.
I tried to uncomment them, but then I get errors
| end = std::chrono::high_resolution_clock::now(); | ||
| total_duration = | ||
| std::chrono::duration_cast<std::chrono::milliseconds>(end - start); | ||
| std::cout << "TBB notmatrix: " << total_duration.count() << " ms" |
|
Proposed fix for the Linux OpenMP failure: if(NOT WIN32)
find_package(OpenMP)
endif()
if(OpenMP_FOUND)
message(STATUS "OpenMP found - enabling parallel support")
add_compile_definitions(HAS_OPENMP)
link_libraries(OpenMP::OpenMP_CXX)
else()
message(STATUS "OpenMP not found - parallel features disabled")
endif()Notes:
|

No description provided.